Adaptive Mixing of Auxiliary Losses in Supervised Learning
Authors
Abstract
In many supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling is provided by labeling functions which may be noisy approximations of the true labels. We tackle the problem of combining these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach towards solving this objective, and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. We also empirically analyze our method and share insights into the mechanisms through which it provides performance gains. The code for AMAL is available at: https://github.com/durgas16/AMAL.git.
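The abstract outlines the core mechanism: per-instance mixing weights over auxiliary losses, learned by differentiating a validation loss through a model update. Below is a minimal PyTorch sketch of one such bi-level step; it is not the authors' implementation. The two-loss setup (cross-entropy plus a distillation term) and the names `meta_step`, `mix_logits`, `inner_lr`, and `meta_lr` are illustrative assumptions.

```python
# Minimal sketch of instance-level loss mixing learned by bi-level
# (meta) optimization, in the spirit of AMAL. Illustrative only; names
# and the two-loss setup are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def meta_step(model, mix_logits, x, y, teacher_logits, x_val, y_val,
              inner_lr=0.1, meta_lr=0.01):
    """One bi-level update: take a differentiable step on the mixed
    training loss, then update the mixing weights from validation loss.

    mix_logits: (batch, 2) leaf tensor with requires_grad=True, holding
    the per-instance mixing logits for this batch's training examples.
    """
    # Per-instance weights over the two losses (rows sum to 1).
    lam = torch.softmax(mix_logits, dim=1)

    logits = model(x)
    ce = F.cross_entropy(logits, y, reduction="none")      # supervised loss
    kd = F.kl_div(F.log_softmax(logits, dim=1),
                  F.softmax(teacher_logits, dim=1),
                  reduction="none").sum(dim=1)             # distillation loss
    mixed = (lam[:, 0] * ce + lam[:, 1] * kd).mean()

    # Inner step: one SGD step on the model parameters, kept
    # differentiable so gradients can flow back to the mixing weights.
    params = dict(model.named_parameters())
    grads = torch.autograd.grad(mixed, list(params.values()),
                                create_graph=True)
    fast = {name: p - inner_lr * g
            for (name, p), g in zip(params.items(), grads)}

    # Outer step: evaluate the adapted model on validation data and
    # push the resulting meta-gradient into the mixing weights.
    val_logits = torch.func.functional_call(model, fast, (x_val,))
    val_loss = F.cross_entropy(val_logits, y_val)
    (mix_grad,) = torch.autograd.grad(val_loss, mix_logits)
    with torch.no_grad():
        mix_logits -= meta_lr * mix_grad
    return val_loss.item()
```

In a full training loop, the adapted parameters would also be committed to the model, and `mix_logits` would index into a table of weights maintained for every training instance, so that each example learns its own balance between the supervised and auxiliary losses.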
Similar resources
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
We present a simple method to improve the learning of long-term dependencies in recurrent neural networks (RNNs) by introducing unsupervised auxiliary losses. These auxiliary losses force RNNs to either remember the distant past or predict the future, enabling truncated backpropagation through time (BPTT) to work on very long sequences. We experimented on sequences up to 16,000 tokens long and report faster t...
Semi-Supervised Learning with Conditional Harmonic Mixing
Recently graph-based algorithms, in which nodes represent data points and links encode similarities, have become popular for semi-supervised learning. In this chapter we introduce a general probabilistic formulation called ‘Conditional Harmonic Mixing’, in which the links are directed, a conditional probability matrix is associated with each link, and where the numbers of classes can vary from ...
Adaptive Sparseness for Supervised Learning
The goal of supervised learning is to infer a functional mapping based on a set of training examples. To achieve good generalization, it is necessary to control the “complexity” of the learned function. In Bayesian approaches, this is done by adopting a prior for the parameters of the function being learned. We propose a Bayesian approach to supervised learning, which leads to sparse solutions;...
Improving Semi-Supervised Learning with Auxiliary Deep Generative Models
Deep generative models based upon continuous variational distributions parameterized by deep networks give state-of-the-art performance. In this paper we propose a framework for extending the latent representation with extra auxiliary variables in order to make the variational distribution more expressive for semi-supervised learning. By utilizing the stochasticity of the auxiliary var...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i8.26176